Introduction to Kaggle Kernels
Schematic of Kaggle Kernel
Fig. shows the schematic of a Kaggle Kernel. Kaggle's coding, analysis, and collaboration product, previously called Scripts, was renamed Kaggle Kernels on 7/8/2016 [1].
Kaggle Kernels are a combination of environment, input, code, and output, all stored together for every version you create.
Because all of these attributes are stored together, Kernels are fundamentally reproducible, easy to share, and easy to learn from or fork.
Kaggle Kernels is a free platform to run Jupyter notebooks in the browser.
The processing power for the Jupyter notebook comes from servers up in the clouds, not your local machine.
You can do a lot of data science and machine learning without heating up your laptop.
Kaggle provides the compute power, memory, disk space, and a limited execution time for each kernel (4 CPUs, 16 GB RAM, 1 GB disk space, 60 min execution time).
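The limits quoted above can be compared with what a running session actually reports. The snippet below is a minimal sketch that uses only the Python standard library; the helper name `describe_resources` is made up for this illustration, and the numbers it prints depend on the machine it runs on, so they may differ from the figures listed here.

```python
import os
import shutil


def describe_resources(path="."):
    """Return the CPU count and disk space (in GB) visible to this Python session."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    gb = 1024 ** 3
    return {
        "cpus": os.cpu_count(),  # may be None if it cannot be determined
        "disk_total_gb": round(usage.total / gb, 2),
        "disk_free_gb": round(usage.free / gb, 2),
    }


if __name__ == "__main__":
    for name, value in describe_resources().items():
        print(f"{name}: {value}")
```

Running this in a notebook cell gives a quick check of the environment before starting a long computation.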
Kaggle Kernel operation overview
Once we create an account at Kaggle.com, we can choose a dataset that we want to play with and spin up a new kernel or notebook in just a few clicks.
Overview of Kernel feature
Fig. shows the Kaggle Kernels overview, where four sheets are provided on the kernel webpage. Kernels are characterized by type, category, output, and language.
Script and Notebook are the two types of kernel.
The Script type is ideal for fitting a model and for competition submissions.
Create a Notebook Kernel
Click "Sign in" to enter your Kaggle account.
Fig. shows the Kaggle entry screen.
Click "Kaggle" to enter the Kaggle Kernels platform.
Fig. shows the upper part of the Kaggle website.
Click "New Kernel" to create a new Kaggle kernel.
Fig. shows the upper part of the Kaggle Kernels website.
Choose Script or Notebook. In this case, Notebook is chosen.
Fig. shows the choice between Script and Notebook when starting a new kernel.
In the Script type, all of the code runs every time; this type is ideal for fitting a model and for competition submissions.
In the Notebook type, cells of code and markdown run individually; this type is ideal for interactive data exploration and polished analysis, and it lets you share insights through code and commentary.
Enter a title for the notebook and add a data source.
Fig. shows the web screen after the Notebook type is selected.
Choose Fashion MNIST in the Datasets category.
Fig. shows the website screen after clicking to add a data source.
Three categories are available: Datasets, Competitions, and Kernels. In this case, Fashion MNIST in the Datasets category is chosen.
Notebook default setting
Fig. shows the Notebook type default setting.
Cells of code can run individually.
Fig. shows the cells of code, which can run individually.
There’s no need to push a dataset onto the machine or to wait for large datasets to copy over a network.
Click the "Input Files" icon.
Fig. shows the Input Files icon on the Notebook screen.
Data files screen after clicking the "Input Files" icon.
Fig. shows the data files listed after clicking the "Input Files" icon.
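The same list of data files can also be produced from a code cell. The sketch below walks a directory and reports each file with its size. The default path `../input` is an assumption about where the kernel mounts attached data (it is not stated in this text), and `list_input_files` is a helper name chosen only for this example.

```python
import os


def list_input_files(root="../input"):
    """Return sorted (relative path, size in bytes) pairs for every file under root."""
    found = []
    # os.walk yields nothing if root does not exist, so the result is then empty
    for folder, _subdirs, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(folder, name)
            rel_path = os.path.relpath(full_path, root)
            found.append((rel_path, os.path.getsize(full_path)))
    return sorted(found)


if __name__ == "__main__":
    for rel_path, size in list_input_files():
        print(f"{rel_path}: {size / 1e6:.1f} MB")
```

If the data lives somewhere else, pass that directory as `root`.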
The Fashion MNIST dataset contains 10 categories of clothing and accessories (such as pants, bags, heels, and shirts). There are 60k training samples and 10k evaluation samples.
The dataset is provided on Kaggle in the form of csv files. The original data were 28x28 pixel grayscale images, which have been flattened into 784 distinct columns in the csv file. The file also contains a column holding the class index, 0 through 9, of the fashion item.
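To make the flattened layout concrete, the sketch below turns one csv row back into a class index and a 28x28 grid of pixel values. It assumes that the first column holds the class index, that the 784 pixel columns follow in row-major order, and that the file starts with a header row; none of these details is stated above, so check them against the actual file. The class names are those published with the Fashion-MNIST dataset, and the function names are made up for this illustration.

```python
import csv
import io

IMAGE_SIDE = 28  # 28 x 28 = 784 pixel columns per row

# Class names as published with Fashion-MNIST, indexed by class index 0..9
CLASS_NAMES = [
    "T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
    "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot",
]


def decode_row(row):
    """Split one csv row (class index first, then 784 pixels) into (label, 28x28 grid)."""
    label = int(row[0])
    pixels = [int(value) for value in row[1:]]
    if len(pixels) != IMAGE_SIDE * IMAGE_SIDE:
        raise ValueError(f"expected {IMAGE_SIDE * IMAGE_SIDE} pixels, got {len(pixels)}")
    # Assumption: pixels are stored row by row (row-major order)
    image = [pixels[r * IMAGE_SIDE:(r + 1) * IMAGE_SIDE] for r in range(IMAGE_SIDE)]
    return label, image


def read_samples(csv_file, limit=None):
    """Read up to `limit` samples from an open csv file that starts with a header row."""
    reader = csv.reader(csv_file)
    next(reader)  # skip the header row
    samples = []
    for i, row in enumerate(reader):
        if limit is not None and i >= limit:
            break
        samples.append(decode_row(row))
    return samples


if __name__ == "__main__":
    # Synthetic one-row file standing in for the real csv
    header = ",".join(["label"] + [f"pixel{i}" for i in range(1, 785)])
    row = ",".join(["8"] + ["0"] * 784)
    label, image = read_samples(io.StringIO(header + "\n" + row + "\n"))[0]
    print(CLASS_NAMES[label], len(image), len(image[0]))
```

With the real file, open it with `open(path, newline="")` and pass the file object to `read_samples`.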
Upload Datasets
You can still load additional files (up to 1 GB) into the kernel if you want to.
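Before uploading, it can help to check whether the files fit under that limit. The sketch below sums file sizes and compares the total with a 1 GB budget. It reads "1 GB" as 1024^3 bytes and treats the limit as applying to the combined size of all files; both readings are assumptions, and the helper names are hypothetical.

```python
import os

# "Up to 1 GB" from the text, read here as 1024**3 bytes (assumption)
UPLOAD_LIMIT_BYTES = 1024 ** 3


def total_size(paths):
    """Return the combined size in bytes of the given files."""
    return sum(os.path.getsize(path) for path in paths)


def fits_upload_limit(paths, limit=UPLOAD_LIMIT_BYTES):
    """Return True if the files together stay within the limit."""
    return total_size(paths) <= limit
```

For example, `fits_upload_limit(["extra_labels.csv"])` would report whether a single local file (the name is only a placeholder) stays within the budget.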
[1] http://blog.kaggle.com/2016/07/08/kaggle-kernel-a-new-name-for-scripts/
[2] https://towardsdatascience.com/introduction-to-kaggle-kernels-2ad754ebf77